{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 神经网络的反向传播"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 导数与偏导数\n",
    "\n",
    "导数是描述函数变化率的概念。设$y=f(x)$,则$y$对$x$的导数定义为:\n",
    "\n",
    "$$f'(x)=\\lim_{h \\to 0}\\frac{f(x+h)-f(x)}{h}$$\n",
    "\n",
    "其中$h$是$x$的一个小量,当$h$趋于0时,函数$f(x)$在$x$处的变化率即为导数$f'(x)$。\n",
    "\n",
    "偏导数是多变量函数的导数概念。设$z=f(x,y)$,则$z$对$x$的偏导数定义为:\n",
    "\n",
    "$$\\frac{\\partial z}{\\partial x}=\\lim_{h \\to 0}\\frac{f(x+h,y)-f(x,y)}{h}$$\n",
    "\n",
    "类似地,$z$对$y$的偏导数为:\n",
    "\n",
    "$$\\frac{\\partial z}{\\partial y}=\\lim_{h \\to 0}\\frac{f(x,y+h)-f(x,y)}{h}$$\n",
    "\n",
    "偏导数描述了函数中一个自变量对函数值的影响,而其它自变量保持不变。\n",
    "\n",
    "导数和偏导数都用来描述函数的变化率,但导数针对单变量函数,偏导数用于多变量函数。两者公式形式很相似,都是求函数在某点沿某个方向的变化率极限。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"20.png\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from IPython.display import Image\n",
    "Image(url= \"20.png\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"16.png\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image(url= \"16.png\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 反向传播中的链式求导\n",
    "\n",
    "反向传播(Backpropagation)是训练神经网络的重要算法,它的计算过程和原理可以用以下公式表达:\n",
    "\n",
    "我们假设一个简单的多层前馈神经网络,包含输入层x,隐藏层a,输出层y,以及对应的权重矩阵W和偏置b。\n",
    "\n",
    "那么这个网络的前向传播可以表示为:\n",
    "\n",
    "隐藏层:\n",
    "$$ z^{(1)} = W^{(1)} x + b^{(1)} $$ \n",
    "$$ a^{(1)} = f(z^{(1)}) $$\n",
    "\n",
    "输出层:\n",
    "$$ z^{(2)} = W^{(2)} a^{(1)} + b^{(2)} $$\n",
    "$$ y = f(z^{(2)}) $$\n",
    "\n",
    "其中f表示激活函数。\n",
    "\n",
    "在反向传播中,我们以损失函数L为参考,计算每个变量对L的梯度贡献:\n",
    "\n",
    "1. 输出层权重的梯度:\n",
    "$$ \\frac{\\partial L}{\\partial W^{(2)}} = \\frac{\\partial L}{\\partial y} \\frac{\\partial y}{\\partial z^{(2)}} \\frac{\\partial z^{(2)}}{\\partial W^{(2)}} $$\n",
    "\n",
    "2. 输出层偏置的梯度:  \n",
    "$$ \\frac{\\partial L}{\\partial b^{(2)}} = \\frac{\\partial L}{\\partial y} \\frac{\\partial y}{\\partial z^{(2)}} \\frac{\\partial z^{(2)}}{\\partial b^{(2)}} $$\n",
    "\n",
    "3. 隐藏层权重的梯度:\n",
    "$$ \\frac{\\partial L}{\\partial W^{(1)}} = \\frac{\\partial L}{\\partial z^{(2)}} \\frac{\\partial z^{(2)}}{\\partial a^{(1)}} \\frac{\\partial a^{(1)}}{\\partial z^{(1)}} \\frac{\\partial z^{(1)}}{\\partial W^{(1)}}$$\n",
    "\n",
    "4. 隐藏层偏置的梯度:\n",
    "$$ \\frac{\\partial L}{\\partial b^{(1)}} = \\frac{\\partial L}{\\partial z^{(2)}} \\frac{\\partial z^{(2)}}{\\partial a^{(1)}} \\frac{\\partial a^{(1)}}{\\partial z^{(1)}} \\frac{\\partial z^{(1)}}{\\partial b^{(1)}}$$\n",
    "\n",
    "由此可更新每个参数,使loss函数L最小化。\n",
    "\n",
    "反向传播自动进行链式法则的梯度计算,实现了有效的深度网络训练。这是它被广泛采用的关键原因。\n",
    "\n",
    "\n",
    "* loss.backward()\n",
    "* net.h1.weight.grad "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.optim as optim"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(tensor(3.),)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x=torch.tensor(1.,requires_grad=True)\n",
    "y=x**3\n",
    "torch.autograd.grad(y,x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Model0(nn.Module):\n",
    "    def __init__(self,in_features=10,out_features=2):\n",
    "        super(Model0,self).__init__()\n",
    "        self.h1=nn.Linear(3,3,bias=True)\n",
    "        self.h2=nn.Linear(3,2,bias=True)\n",
    "        self.out=nn.Linear(2,3,bias=True)\n",
    "\n",
    "    def forward(self, x):\n",
    "        h1_out=self.h1(x)\n",
    "        h1_out_r=torch.relu(h1_out)\n",
    "        h2_out=self.h2(h1_out_r)\n",
    "        h2_out_r=torch.relu(h2_out)\n",
    "        out_out=self.out(h2_out_r)\n",
    "        #out_out_s=torch.softmax(out_out,dim=1)\n",
    "        return out_out"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "x=torch.tensor([[1.0,2.0,1.5],[3.0,4.0,2.3],[5.0,6.0,1.2]])\n",
    "z=torch.tensor([0,2,1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "net=Model0(3,3)\n",
    "net(x)\n",
    "output=net.forward(x)\n",
    "#print(output)\n",
    "criterion = nn.CrossEntropyLoss()\n",
    "loss = criterion(output,z)#在PyTorch中,有些情况下目标向量(target)需要使用.long()方法转换为长整型tensor(long tensor),"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"20.png\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image(url= \"20.png\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 梯度下降算法\n",
    "\n",
    "\n",
    "### 梯度向量\n",
    "\n",
    "梯度向量的方向和大小确定方式可以用以下公式解释:\n",
    "\n",
    "假设有目标函数$f(\\mathbf{x})$,其中$\\mathbf{x}$是$n$维向量。\n",
    "\n",
    "则目标函数$f$相对于$\\mathbf{x}$的梯度是一个向量:\n",
    "\n",
    "$$\\nabla_{\\mathbf{x}} f = \\begin{bmatrix} \\frac{\\partial f}{\\partial x_1} \\\\ \\frac{\\partial f}{\\partial x_2} \\\\ \\vdots \\\\ \\frac{\\partial f}{\\partial x_n} \\end{bmatrix}$$\n",
    "\n",
    "其中:\n",
    "\n",
    "- $\\frac{\\partial f}{\\partial x_i}$表示$f$相对于$x_i$的偏导数\n",
    "\n",
    "- 梯度向量的每个元素表示目标函数相对于对应变量的变化率\n",
    "\n",
    "梯度向量方向性质:\n",
    "\n",
    "- 梯度方向与目标函数增长最快的方向相反\n",
    "\n",
    "- 沿负梯度方向移动可以使目标函数值下降最快\n",
    "\n",
    "梯度向量大小性质:\n",
    "\n",
    "- 梯度大小表示目标函数沿该方向的变化率\n",
    "\n",
    "- 梯度越大,目标函数越“陡峭”\n",
    "\n",
    "所以梯度向量充分表示了目标函数的局部变化,方向和大小的合理利用都有利于设计优化算法。\n",
    "\n",
    "\n",
    "\n",
    "### 对于权重参数的梯度下降的更新公式\n",
    "\n",
    "假设一个优化问题,其目标是最小化损失函数$L(\\omega)$,其中ω表示模型的参数。\n",
    "\n",
    "则梯度下降法的迭代更新公式为:\n",
    "\n",
    "$$\\omega^{(t+1)} = \\omega^{(t)} - \\eta \\nabla_{\\omega} L(\\omega^{(t)})$$\n",
    "\n",
    "其中:\n",
    "\n",
    "- $\\omega^{(t)}$: 模型参数在第$t$次迭代的值  \n",
    "- $\\eta$: 学习率,确定更新步长\n",
    "- $\\nabla_{\\omega} L(\\omega^{(t)})$: $L$相对于$\\omega$的梯度向量\n",
    "- $t$: 迭代次数\n",
    "\n",
    "梯度下降的迭代过程是:\n",
    "\n",
    "1. 初始化参数$\\omega$ \n",
    "\n",
    "2. 计算损失函数$L(\\omega)$关于参数$\\omega$的梯度$\\nabla_{\\omega} L(\\omega)$\n",
    "\n",
    "3. 沿负梯度方向更新参数,步长为学习率$\\eta$\n",
    "\n",
    "4. 重复2-3,直到损失函数收敛或达到预定迭代次数\n",
    "\n",
    "* net.h1.weight.grad\n",
    "* net.h1.weight.data\n",
    "* lr  dw  w  gamma\n",
    "* v = torch.zeros(dw.shape[0],dw.shape[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"21.png\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image(url= \"21.png\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 动量法梯度下降\n",
    "\n",
    "基于动量法(Momentum)的梯度下降可以使用动量项加速学习,提高收敛速度。其公式表达如下:\n",
    "\n",
    "假设参数为$\\omega$,损失函数为$J(\\omega)$,学习率为$\\eta$。\n",
    "\n",
    "基础梯度下降更新:\n",
    "\n",
    "$$\\omega_{t+1} = \\omega_{t} - \\eta \\nabla_\\omega J(\\omega_{t}) $$\n",
    "\n",
    "引入动量项$\\nu$,动量参数为$\\gamma$:\n",
    "\n",
    "$$\\nu_{t+1} = \\gamma \\nu_{t} + \\eta \\nabla_\\omega J(\\omega_{t})$$\n",
    "\n",
    "$$\\omega_{t+1} = \\omega_{t} - \\nu_{t+1}$$\n",
    "\n",
    "动量法的思想:\n",
    "\n",
    "- 使用$\\nu$存储历史梯度作为动量\n",
    "- $\\nu$平滑更新,抑制梯度震荡\n",
    "- $\\gamma$控制历史梯度的使用比例\n",
    "\n",
    "优点:\n",
    "\n",
    "- 加速学习,提高收敛速度\n",
    "- 有效抑制共线性等问题造成的震荡\n",
    "\n",
    "动量法利用历史梯度产生动能,帮助优化器逃离局部最优,是改进梯度下降的常用技巧。\n",
    "\n",
    "* net.h1.weight.grad\n",
    "* net.h1.weight.data\n",
    "* lr  dw  w  gamma\n",
    "* v = torch.zeros(dw.shape[0],dw.shape[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### torch.optim\n",
    "\n",
    "https://zhuanlan.zhihu.com/p/346205754\n",
    "\n",
    "* 基本用法\n",
    "\n",
    "优化器主要是在模型训练阶段对模型可学习参数进行更新, 常用优化器有 SGD，RMSprop，Adam等\n",
    "\n",
    "优化器初始化时传入传入模型的可学习参数，以及其他超参数如 lr，momentum等\n",
    "\n",
    "在训练过程中先调用 optimizer.zero_grad() 清空梯度，再调用 loss.backward() 反向传播，最后调用 optimizer.step()更新模型参数\n",
    "\n",
    "* 父类Optimizer 基本原理\n",
    "\n",
    "Optimizer 是所有优化器的父类，它主要有如下公共方法:\n",
    "\n",
    "add_param_group(param_group): 添加模型可学习参数组\n",
    "\n",
    "step(closure): 进行一次参数更新\n",
    "\n",
    "zero_grad(): 清空上次迭代记录的梯度信息\n",
    "\n",
    "state_dict(): 返回 dict 结构的参数状态\n",
    "\n",
    "load_state_dict(state_dict): 加载 dict 结构的参数状态\n",
    "\n",
    "* optim.SGD(net.parameters() , lr=lr , momentum = gamma)\n",
    "* opt.zero_grad()\n",
    "* loss.backward()\n",
    "* opt.step()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
